Abstract

Structural properties possess valuable information about the formation and evolution of galaxies, and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types, and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog shows that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images automatically, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated and the method can be used to analyze user-defined datasets of galaxy images.
1 Introduction

In the past few years, advancements in computational tools and algorithms have started to allow automatic analysis of galaxy morphology. Approaches to automatic galaxy classification include model-driven methods such as GALFIT (Peng et al., 2002), GIM2D (Simard, 1999; Simard et al., 2011), CAS (Conselice, 2003), Gini (Abraham et al., 2003), Ganalyzer (Shamir, 2011), and SpArcFiRe (Davis and Hayes, 2014). Data-driven methods include binary classifiers that can differentiate between the broad morphological types of elliptical and spiral galaxies (Shamir, 2009; Meneses Cuadros et al., 2009; Banerji et al., 2010), but also classifiers that can differentiate between four basic objects (Abd Elfattah et al., 2013), classification between the four basic Hubble morphological types E, S0, Sab, and Scd (Huertas-Company et al., 2010), and comprehensive analysis of galaxy images that includes specific morphological features (Baillard et al., 2006; Kuminski et al., 2014; Dieleman et al., 2015). Classification of galaxies can also be performed using spectra, in both supervised (Ball et al., 2004) and unsupervised (Almeida et al., 2010) settings.

While supervised machine learning has demonstrated the ability to classify galaxies by their morphological types (Shamir, 2009; Meneses Cuadros et al., 2009; Banerji et al., 2010; Huertas-Company et al., 2010), discrete classifiers do not effectively conceptualize the continuous nature of galaxy morphology, and therefore galaxy morphological schemes are still defined by manual observation. One of the earliest and most widely used schemes is the Hubble sequence (Hubble, 1936; Sandage, 1961), which covers the morphology of most known galaxies. Hubble's initial work proposed a morphology classification system based on attributes of observed nebulae, originally consisting of three main morphological types, commonly known as elliptical (E), normal spirals (S), and barred spirals (SB) (Hubble, 1936). Humason et al. (1956) revisited the Hubble sequence, introducing lenticular galaxies (S0), creating what is most commonly known as the Hubble "tuning-fork" diagram (De Vaucouleurs, 1959). It should be noted that although irregular (I) galaxies were recognized by Hubble, they were not included in Hubble's classification scheme, since at the time they could not be distinctively classified (De Vaucouleurs, 1959).


Since the Hubble morphological scheme was introduced, several modifications and enhancements have been proposed. Morgan and Mayall (1957) proposed a galaxy classification scheme based on the spectra, showing the correlation between the spectra and the spiral structure and spectral concentration (Morgan, 1958), and identified the cD phenomenon (Morgan and Lesh, 1965). van den Bergh (1965) proposed a classification system of late-type galaxies based on luminosity, driven by the correlation between absolute luminosity and the shape of the spiral arms. That work was followed by a classification scheme of spiral and S0 galaxies, which distinguished "early" and "late" type systems by their disk-to-bulge ratio (Van Den Bergh, 1976).

Sandage (1961) showed that S0₁ to S0₃ galaxies do not feature an increase in the flattening of the galaxies, and that normal spiral galaxies and S0 galaxies form two parallel sequences (Sandage et al., 1970; Van Den Bergh, 1976). Kormendy and Bender (1996) expanded the Hubble classification scheme with a more detailed morphological analysis of elliptical galaxies (Kormendy and Djorgovski, 1989), and proposed some modifications to the Hubble sequence, including the two-component S0 galaxies and the addition of the Magellanic irregulars.

One of the notable refinements and extensions to the Hubble sequence was proposed by De Vaucouleurs (1959), who introduced a three-dimensional system. This classification included the four main broad morphological classes of elliptical, lenticular, spiral, and irregular galaxies along a linear main axis from galaxy types E to Im, including Hubble's initial a, b, c representation for "early" to "late". The refinement of Sandage (1961) included d for "very late" and the division of S0 galaxies into S0⁻, S0°, and S0⁺, as well as the inclusion of "m" for Magellanic galaxies: E, E⁺, S0⁻, S0°, S0⁺, Sa, Sb, Sc, Sd, Sm, or Im (De Vaucouleurs, 1994). The classification was also extended to include intermediate stages between the initial a, b, c, d, and m stages, such as ab, bc, cd, and dm. This scheme introduced a notation based on family, variety, and stage. Family represents the absence of bars in a spiral galaxy (A), the presence of bars (B), or a transition between the two (AB). Variety represents the presence of a ring shape (r), a spiral shape (s), or a transition between the two (rs) within spiral galaxies, and stage represents the position of the galaxy along the main axis. Another feature of this classification scheme was assigning each stage along the main axis a numerical integer value between -6 and 11, for a more quantitative approach to the classification: E galaxies are represented by the values -6 to -4, lenticulars by -3 to -1, spirals by 0 to 9, and irregulars by 10 and 11. Furthering the quantitative approach to galaxy classification, De Vaucouleurs (1994) also introduced measurable parameters showing either a consistent mean increase or decrease along the classification sequence. These characteristics include bulge-to-disk ratios, integrated luminosity in the B-band, the ratio of aperture diameters, total or effective magnitudes, mean surface brightness, and hydrogen index (De Vaucouleurs, 1994).
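The numerical stages can be made concrete with a small lookup table. The sketch below is a hypothetical helper, not part of the paper's software: the broad-class ranges come from the text above, while the individual stage labels assume the usual RC3/EFIGI convention and should be checked against the catalog before use. It also lists the 15 stages that remain once the three sparse classes excluded in Section 2 (cE, cD, and dE) are dropped.

```python
# Hypothetical helper: de Vaucouleurs numerical stage T -> label and broad class.
# Labels assume the usual RC3/EFIGI convention; ranges follow the text above.
STAGE_LABELS = {
    -6: "cE", -5: "E", -4: "cD", -3: "S0-", -2: "S0", -1: "S0+",
    0: "S0/a", 1: "Sa", 2: "Sab", 3: "Sb", 4: "Sbc", 5: "Sc",
    6: "Scd", 7: "Sd", 8: "Sdm", 9: "Sm", 10: "Im", 11: "dE",
}

def broad_class(t: int) -> str:
    """Return the broad morphological class for an integer stage T."""
    if -6 <= t <= -4:
        return "elliptical"
    if -3 <= t <= -1:
        return "lenticular"
    if 0 <= t <= 9:  # the text counts T = 0 with the spirals
        return "spiral"
    if 10 <= t <= 11:
        return "irregular"
    raise ValueError(f"stage {t} is outside the -6..11 range")

# Classes with too few EFIGI samples (Section 2) are left out of the analysis.
EXCLUDED = {-6, -4, 11}
ANALYZED = [t for t in sorted(STAGE_LABELS) if t not in EXCLUDED]

if __name__ == "__main__":
    for t in ANALYZED:
        print(t, STAGE_LABELS[t], broad_class(t))
```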

Although the scheme is quantitative, the association of a galaxy with a morphological type is subjective, and the annotations of two or more astronomers are not necessarily identical in all cases (Naim et al., 1995; de Lapparent et al., 2011). It has therefore been proposed that galaxy morphology classification schemes will involve computational methods (De Vaucouleurs, 1994). Here we perform automatic unsupervised analysis of galaxy images of different morphological types to produce a computer-generated galaxy morphology sequence. The scheme is based on quantitative computer analysis of thousands of annotated galaxy images, producing a network of similarities between the morphological types that is independent of human perception and of the way humans quantify the similarities between these types.


2 Data 


The data used in the study are taken from the EFIGI catalog (Baillard et al., 2011; de Lapparent et al., 2011), which was compiled for the purpose of developing and testing computational methods related to galaxy morphology. The catalog contains image data as well as morphological annotation data of 4458 galaxies taken from the PGC (Principal Galaxies Catalogue), which are also included in SDSS (Sloan Digital Sky Survey) Data Release 4. Among other morphological features, each galaxy was assigned a morphological type determined by 10 astronomers (Baillard et al., 2011) based on the updated RC3-based Hubble types (De Vaucouleurs et al., 1992). Other morphological features include the bulge and spiral arms, as well as features such as texture, appearance in the sky, and environment.

EFIGI contains images of each galaxy in the u, g, r, i, and z bands (Baillard et al., 2011). To produce color images, the i, r, and g bands were combined into a composite RGB image, with a gamma correction of 1.3 applied to the luminosity and the color saturation increased by a factor of 2. The color images were converted to the PNG (Portable Network Graphics) format using the STIFF software (Baillard et al., 2011).

The EFIGI color images were converted to 255 × 255 color 24-bit TIFF (Tagged Image File Format) images using ImageMagick, and were separated into folders such that each folder contained galaxies of the same type as annotated by EFIGI. Images of the same galaxies in the u, g, r, i, and z bands were converted to monochrome TIFF, and were used without color information.

The galaxy types are based on the numerical scheme (De Vaucouleurs, 1959) taken from the EFIGI catalog (Baillard et al., 2011). Each galaxy type had at least 142 samples, except for cE (-6), cD (-4), and dE (11), which only had 18, 44, and 69 samples, respectively. Because of their small size, these classes were not used in the experiment.


3 Image analysis method

The image analysis method used in the experiment is Wndchrm (Shamir et al., 2008a; Shamir, 2008; Shamir et al., 2009b, 2010a, 2013a), which has a feature set of 4027 numerical image content descriptors, or 2885 numerical descriptors when color information is not used. The numerical image content descriptors are the following:

Texture features:

1. Haralick textures: Energy and entropy computed on the co-occurrence matrix of the image (Haralick et al., 1973), measured using 28 image descriptor values as described in Shamir et al. (2008a).

2. Tamura textures: Contrast, directionality, and coarseness of the Tamura textures (Tamura et al., 1978). The coarseness descriptors are its sum and its 3-bin histogram, providing a total of six numerical content descriptors.

3. Gabor filters: Gabor filters (Gabor, 1946) using seven frequencies (1 through 7) and a Gaussian harmonic function (Grigorescu et al., 2002).

Polynomial decomposition:

1. Radon transform features (Lim, 1990): Four series computed for angles 0, 45, 90, and 135 degrees, and then convolved into a 3-bin histogram, providing a total of 12 numerical content descriptors.

2. Chebyshev statistics (Gradshteyn and Ryzhik, 1994): A 32-bin histogram of a 400-bin vector produced by the Chebyshev transform of the image with order N=20.

3. Zernike features: Absolute values of the 72 coefficients of the Zernike polynomial approximation (Teague, 1980).

4. Chebyshev-Fourier features: A 32-bin histogram of the polynomial coefficients of a Chebyshev-Fourier transform (Orlov et al., 2008) with maximum polynomial order of N=23.

High-contrast features:

1. Fractal features, as described in Wu et al. (1992).

2. Edge features: Mean, median, variance, and 8-bin histogram of the magnitude and direction computed on the Prewitt gradient of the image, as well as edge direction homogeneity.

3. High-contrast object statistics: Minimum, maximum, mean, median, variance, Euler number, and 10-bin histogram of the object areas computed on the 8-connected objects found in the Otsu binary transform of the image.

Pixel statistics:

1. Multi-scale histograms: Four histograms with 3, 5, 7, and 9 bins computed on the pixel intensities (Hadjidemetriou et al., 2001).

2. First four moments: Mean, standard deviation, skewness, and kurtosis computed on image "stripes" in four different directions (0, 45, 90, and 135 degrees).

These features are extracted not just from the raw pixel values, but also from two-dimensional transforms of the image and from combinations of multi-order transforms. The transforms are the Fourier transform, Chebyshev transform, wavelet (symlet 5, level 1) transform, color transform (Shamir, 2006), and edge magnitude transform. A detailed description and performance analysis of the image features extracted from image transforms and multi-order transforms can be found in Shamir et al. (2008a), Shamir (2008), and Shamir et al. (2009b, 2010a, 2013a).
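The two pixel-statistics groups are simple enough to sketch directly. The code below computes multi-scale histograms (3, 5, 7, and 9 bins, i.e. 24 values) and the first four moments for a flat list of 8-bit intensities. It is an illustrative approximation, not Wndchrm's implementation: Wndchrm computes the moments on image stripes in four directions, its exact binning and normalization may differ, and the 0 to 255 intensity range is an assumption.

```python
import math
from typing import List, Sequence, Tuple

def multiscale_histograms(pixels: Sequence[float], lo: float = 0.0, hi: float = 255.0,
                          scales: Sequence[int] = (3, 5, 7, 9)) -> List[float]:
    """Concatenate normalized intensity histograms computed with several bin counts."""
    n = len(pixels)
    features: List[float] = []
    for bins in scales:
        counts = [0] * bins
        for p in pixels:
            idx = int((p - lo) / (hi - lo) * bins)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp so p == hi lands in the last bin
        features.extend(c / n for c in counts)
    return features

def first_four_moments(values: Sequence[float]) -> Tuple[float, float, float, float]:
    """Mean, standard deviation, skewness, and (non-excess) kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:  # constant input: higher moments are undefined, report zeros
        return mean, 0.0, 0.0, 0.0
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return mean, std, skew, kurt

if __name__ == "__main__":
    pixels = [0, 64, 128, 192, 255]
    print(len(multiscale_histograms(pixels)))  # 24 = 3 + 5 + 7 + 9
    print(first_four_moments(pixels))
```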

The comprehensive nature of the numerical image content descriptors allows the analysis of complex morphology in fields such as radiology (Shamir et al., 2009b,a) and microscopy (Shamir et al., 2008b, 2010a; Manning and Shamir). In particular, the Wndchrm feature set has proved informative for the analysis of galaxy morphology, and was found useful for tasks such as galaxy classification (Shamir, 2009; Kuminski et al., 2014) and automatic detection of peculiar galaxies (Shamir, 2012a; Shamir and Wallin, 2014; Shamir et al., 2014a).

A complete and detailed description of the set of numerical image content descriptors and their performance analysis is available in Shamir et al. (2008a), Orlov et al. (2008), Shamir (2008), and Shamir et al. (2010a), and the source code is also publicly available through the Astrophysics Source Code Library (Shamir et al., 2013b).

As mentioned in Section 1, the purpose of the method is not to automatically classify galaxies by their morphology, but to quantitatively deduce a network of similarities between the different morphological types using merely the galaxy images, without using metadata or existing knowledge that is not in the image content. The unsupervised analysis (Shamir et al., 2010a; Shamir and Tarakhovsky, 2012; Shamir et al., 2013a) works by first allocating 140 galaxy images from each galaxy type, as annotated by EFIGI, to the training set, and assigning each numerical image content descriptor its Fisher discriminant score (Bishop et al., 2006) computed using the training samples. After the content descriptors are ranked by their Fisher discriminant scores, the 85% least informative features, i.e., those with the lowest Fisher scores, are rejected (Shamir, 2009; Shamir et al., 2009b).

The similarity between each pair of galaxy images is then computed using the Weighted Nearest Distance (WND) algorithm (Shamir et al., 2008a, 2010a). The mean similarity between all test galaxies of type t₁ and all training galaxies of type t₂ determines the similarity between these two galaxy morphological types. The similarities between all pairs of galaxy types produce a similarity matrix, which is normalized as described in Shamir et al. (2008a, 2010a), Shamir and Tarakhovsky (2012), and Shamir et al. (2013a). The similarity matrix is computed 20 times, such that in each run different images are randomly allocated to the training and test sets, and the final similarity matrix is generated by averaging the 20 similarity matrices.

The similarity matrix is visualized by PHYLIP (Felsenstein, 1993; Kuhner and Felsenstein, 1994), which was originally developed for visualizing similarities between organisms by their genotypes, but in this experiment is used to visualize the similarities between galaxy types. It is used with a randomized input order of sequences with a seed of 97, 10 jumbles, and the Equal-Daylight arc optimization. When pairs of nodes are added, new nodes are created to provide the optimal tree that represents the similarity matrix. PHYLIP first creates the tree as a text file in the Newick format, and then visualizes it using the DRAWTREE program. The edges between the nodes reflect the degree of similarity between them, such that a shorter path between two nodes reflects a higher similarity between the images of these two classes. DRAWTREE automatically sets the angles so that the tree is easy to read. In the phylogeny created by PHYLIP, each pair of nodes has just one possible path between them, and the length of the path includes all segments on that path, including edges between nodes added by PHYLIP during the tree optimization process.

The method used to compute and visualize the similarities between the galaxy types is described in detail in Shamir et al. (2010a), Shamir and Tarakhovsky (2012), and Shamir et al. (2014b), and was used for unsupervised analysis of simulated images of galaxy mergers (Shamir et al., 2013a). It has also demonstrated its ability to profile continuous biomedical processes in which the clinical stages are reflected by image morphologies that change on a continuous scale (Shamir et al., 2010b). Detailed instructions, including the specific command lines used to produce the results, are described in Appendix A.
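The training and test procedure described above can be sketched as follows. This is a simplified illustration in the spirit of Wndchrm, not its actual code. The Fisher score uses a common multi-class form (variance of the class means over the mean within-class variance); the similarity of a sample x to a class c is assumed to be $S(x,c)=\frac{1}{|T_c|}\sum_{t\in T_c} d(x,t)^{-p}$ with $d(x,t)=\sum_f W_f^2 (x_f-t_f)^2$ and p = 5; and the per-sample similarities are normalized to sum to 1. These formulas, the exponent, and all function names are assumptions; Shamir et al. (2008a) give the definitive WND description.

```python
import random
from statistics import fmean, pvariance
from typing import Dict, List

Vector = List[float]
Dataset = Dict[str, List[Vector]]

def fisher_scores(train: Dataset) -> List[float]:
    """Per-feature ratio of between-class variance to mean within-class variance."""
    classes = list(train)
    n_features = len(train[classes[0]][0])
    scores = []
    for f in range(n_features):
        means = [fmean(v[f] for v in train[c]) for c in classes]
        within = fmean(pvariance([v[f] for v in train[c]]) for c in classes)
        grand = fmean(means)
        between = sum((m - grand) ** 2 for m in means) / (len(classes) - 1)
        scores.append(between / (within + 1e-12))  # epsilon guards zero variance
    return scores

def wnd_similarity(x: Vector, train: Dataset, weights: List[float],
                   keep: List[int], p: int = 5) -> Dict[str, float]:
    """Similarity of sample x to every class, normalized to sum to 1."""
    wmax = max(weights[i] for i in keep) or 1.0  # rescaling cancels after normalization
    sims = {}
    for c, samples in train.items():
        total = 0.0
        for t in samples:
            d = sum((weights[i] / wmax) ** 2 * (x[i] - t[i]) ** 2 for i in keep)
            total += (d + 1e-12) ** -p
        sims[c] = total / len(samples)
    norm = sum(sims.values())
    return {c: s / norm for c, s in sims.items()}

def class_similarity_matrix(train: Dataset, test: Dataset,
                            keep_fraction: float = 0.15, p: int = 5):
    """Mean similarity of the test samples of each class to every training class."""
    scores = fisher_scores(train)
    k = max(1, round(len(scores) * keep_fraction))  # keep the top-scoring features only
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    matrix = {}
    for c1 in train:
        rows = [wnd_similarity(x, train, scores, keep, p) for x in test[c1]]
        matrix[c1] = {c2: fmean(r[c2] for r in rows) for c2 in train}
    return matrix

def averaged_matrix(data: Dataset, n_train: int = 140, n_runs: int = 20,
                    seed: int = 0, **kwargs):
    """Average the class similarity matrix over random train/test splits."""
    rng = random.Random(seed)
    avg = {c1: {c2: 0.0 for c2 in data} for c1 in data}
    for _ in range(n_runs):
        train, test = {}, {}
        for c, samples in data.items():
            shuffled = samples[:]
            rng.shuffle(shuffled)
            # each class needs more than n_train samples so its test set is not empty
            train[c], test[c] = shuffled[:n_train], shuffled[n_train:]
        m = class_similarity_matrix(train, test, **kwargs)
        for c1 in data:
            for c2 in data:
                avg[c1][c2] += m[c1][c2] / n_runs
    return avg
```

With the setup described above, `n_train=140`, `n_runs=20`, and `keep_fraction=0.15` would mirror the stated parameters. Dividing the weights by their maximum does not change the normalized result, because it scales every distance by the same factor, but it keeps the powers of the distances in a numerically safe range.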


4 Results 

The application of the similarity estimation method described in Section 3 to the EFIGI color image data described in Section 2 produced the phylogeny displayed in Figure 1.

As the figure shows, the network of similarities between the galaxy morphological types computed by the algorithm is in agreement with the ordered sequence proposed by De Vaucouleurs (1959). The algorithm produced a graph starting with the ellipticals (-5), followed by the lenticular galaxies (-3 through -1). Continuing sequentially are the spiral galaxies (1 through 9), followed by the irregulars (10), in an order in perfect agreement with De Vaucouleurs (1959).

Figure 1: The network of similarities between the galaxy morphological types as deduced automatically by the algorithm


As mentioned in Section 2, the cD (-4), cE (-6), and dE (11) types were not included in the analysis due to the insufficient number of sample images of these types in EFIGI. Of the 15! possible orderings of the remaining 15 types, only two (fully ascending and fully descending) are monotonic, so the probability that the 15 elements are ordered in ascending or descending order by mere chance is $2/15! \approx 1.53 \times 10^{-12}$.
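The arithmetic can be checked directly, assuming that under pure chance every ordering of the 15 types is equally likely:

```python
from math import factorial

def prob_monotone_by_chance(n: int) -> float:
    """Probability that a uniformly random ordering of n distinct items is
    fully ascending or fully descending (2 of the n! permutations)."""
    return 2 / factorial(n)

print(f"{prob_monotone_by_chance(15):.3g}")  # prints 1.53e-12
```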

In another experiment we tested the method using the color images converted to gray-scale and normalized for intensity, such that all images had a mean pixel value of 100 and a standard deviation of 25 (Shamir et al., 2008a). The normalization ensures that the order is determined by the shape, without any impact of color or brightness. The resulting graph produced by the algorithm is displayed in Figure 2.
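A minimal sketch of this intensity normalization, assuming a plain linear rescaling of each gray-scale image to the stated mean and standard deviation. The values match the 100:25 setting passed to wndchrm in step 6 of Appendix A, but how Wndchrm handles details such as clipping to the 8-bit range is not described here, and this sketch does not clip.

```python
from statistics import fmean, pstdev
from typing import List, Sequence

def normalize_intensity(pixels: Sequence[float], target_mean: float = 100.0,
                        target_std: float = 25.0) -> List[float]:
    """Linearly rescale gray-scale intensities to a target mean and standard deviation.

    No clipping is applied, so outputs may fall outside 0..255.
    """
    mu = fmean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:  # a flat image has no contrast to rescale
        return [target_mean] * len(pixels)
    return [target_mean + target_std * (p - mu) / sigma for p in pixels]

if __name__ == "__main__":
    out = normalize_intensity([10, 20, 30, 40])
    print(round(fmean(out), 6), round(pstdev(out), 6))  # 100.0 25.0
```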



Figure 2: The network of similarities between the 
galaxy types using normalized gray-scale images 


As the figure shows, the analysis of the normalized gray-scale images provided results similar to the graph produced using the color images, showing that the order was not necessarily driven by pixel intensity or by color. The random chance probability that 12 elements out of 15 are ordered in ascending or descending order

It is also noticeable that the S0 galaxy types S0⁻ (-3), S0° (-2), and S0⁺ (-1) do not follow the numerical order proposed by De Vaucouleurs (1959). This result of the computer analysis is in agreement with the observation that S0⁻, S0°, and S0⁺ galaxies do not feature an increase in the flattening of the galaxies (Sandage, 1961).

Figure 3 shows the Fisher discriminant scores of the groups of numerical image content descriptors, reflecting the measured informativeness of the descriptors and consequently their impact on the analysis. The descriptors are extracted from the image transforms and multi-order transforms.

As the figure shows, the identification of the Hubble stage depends on numerous image content descriptors working in concert. The fractal features were the most informative descriptors, indicating that the fractality of a galaxy differs across galaxy morphological types. This agrees with the observation that fractality can be used as a galaxy classification signature (Lekshmi et al., 2003), and can assist in differentiating between elliptical and spiral galaxies (Shamir, 2009). For instance, an elliptical galaxy has low fractality due to the absence of complex shape, whereas the fractality of a galaxy is expected to become more dominant as the galaxy has more arms and split arms.
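Wndchrm's fractal descriptors follow Wu et al. (1992), which is not reproduced here. As a generic, swapped-in illustration of what "fractality" measures, the sketch below estimates the box-counting dimension of a binary mask. Applied to an edge map of a galaxy (an assumption about how one might use it), a smooth outline gives a value near 1, while an outline with structure at many scales gives a higher value. The mask format and the box sizes are assumptions.

```python
import math
from typing import List, Sequence

def box_counting_dimension(mask: List[List[int]], sizes: Sequence[int] = (1, 2, 4, 8)) -> float:
    """Estimate the box-counting dimension of the nonzero pixels of a square binary mask."""
    n = len(mask)
    xs, ys = [], []
    for s in sizes:
        occupied = 0  # number of s-by-s boxes containing at least one foreground pixel
        for i in range(0, n, s):
            for j in range(0, n, s):
                if any(mask[r][c] for r in range(i, min(i + s, n)) for c in range(j, min(j + s, n))):
                    occupied += 1
        if occupied == 0:
            raise ValueError("mask has no foreground pixels")
        xs.append(math.log(1.0 / s))
        ys.append(math.log(occupied))
    # Least-squares slope of log N(s) versus log(1/s)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```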

The graph shows that many other numerical image content descriptors, such as Haralick textures (Haralick et al., 1973), have an impact on the analysis, and work in concert to quantify the similarities between the different galaxy morphological types. Texture features have been shown to be informative in separating galaxies based on their morphological types (Shamir, 2009; Banerji et al., 2010; Pedersen et al., 2013). For instance, texture homogeneity and entropy may change as the galaxy becomes more sparse, and the texture also correlates with the star formation rate (Pedersen et al., 2013).

Figure 3: Fisher discriminant scores of the different groups of numerical content descriptors, extracted from the different image transforms (x-axis: image features; y-axis: Fisher discriminant score)
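To make the Haralick descriptors concrete, the sketch below builds a gray-level co-occurrence matrix for a single pixel offset and derives its energy (the sum of squared probabilities) and entropy. It is a simplified illustration under stated assumptions: one offset, no symmetrization, a base-2 logarithm, and integer gray levels. Wndchrm's 28 Haralick values aggregate several directions and statistics and are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Tuple

def glcm_energy_entropy(img: List[List[int]], dx: int = 1, dy: int = 0) -> Tuple[float, float]:
    """Energy and entropy of the co-occurrence matrix of (img[y][x], img[y+dy][x+dx])."""
    pairs = Counter()
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                pairs[(img[y][x], img[y2][x2])] += 1
    total = sum(pairs.values())
    probs = [c / total for c in pairs.values()]  # normalized co-occurrence entries
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return energy, entropy
```

A uniform patch gives energy 1 and entropy 0, while more varied textures lower the energy and raise the entropy.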


On the other hand, several numerical content descriptors did not show a substantial difference between galaxies of different Hubble stages. For instance, neither the Radon features nor the Tamura textures change between different galaxy types. The weak ability of Tamura textures to differentiate between galaxy types can be attributed to their directionality, which can be offset by galaxies or arms rotating in the opposite direction. This is different from other texture analysis algorithms such as Haralick, where the texture entropy and energy are independent of the direction.


The experiment was also repeated with the EFIGI galaxy images of the u, g, r, i, and z bands. The resulting phylogenies are displayed in Figure 4.

As the figure shows, the order of the galaxy types somewhat deviates from the De Vaucouleurs (1959) scheme. The shorter segments between some of the galaxy types show higher similarity deduced by the method, indicating that in some cases the algorithm could not identify the differences between these types. This shows that although the order of the galaxy types deduced by the algorithm is largely in agreement with the sequence described by De Vaucouleurs (1959), processing just one band leads to loss of information, and consequently the automatic placement of the galaxy types is not as close to the order of De Vaucouleurs (1959) as with the color images. Color has been found to correlate with galaxy types (Strateva et al., 2001), and therefore color information can contribute to the ability of the algorithm to analyze the similarities between different types of galaxies. The probabilities of obtaining the orders of the trees of the u, g, r, i, and z bands by chance are $6.84 \times 10^{-5}$, $6.84 \times 10^{-5}$, $1.9 \times 10^{-\ldots}$, $1.6 \times 10^{-5}$, and $6.84 \times 10^{-5}$, respectively.

Figure 4: The network of similarities between the galaxy types using the u, g, r, i, and z bands of EFIGI

In addition, the analysis of the u band shows a strong separation between late-type galaxies and the other galaxy types, where Sa and Sab are positioned close to the lenticular galaxies. A similar observation can be made with the analysis of galaxies in the g band. The r, i, and z bands show a more even distribution of the types along the main axis, but it is also noticeable that the early types are clustered on one side of the axis, while irregular and Sd galaxies are grouped close to each other on the other side.

Figure 4 also shows that the early-type galaxies could not be ordered correctly by the algorithm without using the color images, and that the individual bands or gray-scale images did not contain sufficient morphological features of these galaxy types to allow automatic positioning in the order proposed by De Vaucouleurs (1959).


5 Conclusion 


Although galaxy classification cannot be considered a goal in itself (De Vaucouleurs, 1994), it is a key to understanding the physical properties of the past, present, and future universe. Numerous galaxy morphological schemes have been proposed by manual observation and measurement of galaxy morphology and photometry. Here we proposed a computer-based approach to galaxy morphology, using an unsupervised machine learning system that can deduce the visual similarities between sets of images and reconstruct morphological sequences of galaxies. The analysis is performed such that the algorithm determines the network of similarities between the different morphological classes automatically, and without human guidance.

The results show that when using the color EFIGI galaxy images, the sequence deduced by the computer is in large agreement with the De Vaucouleurs (1959) scheme, even when the color images are converted to gray-scale. When using each band separately, the deduced order was in weaker agreement with De Vaucouleurs (1959), showing that the composite color images contained more visual information that was used by the algorithm to deduce the order of the morphological types. The saturation and gamma correction applied to the EFIGI color images, as described in Section 2, could also affect the way these images were analyzed. Basic statistical analysis shows a very low probability of approximately $1.53 \times 10^{-12}$ for having the galaxy types ordered in ascending or descending order by mere chance.

The color images allowed the algorithm to deduce a sequence that is more consistent with the order of De Vaucouleurs (1959) than the sequences produced with each of the individual bands, indicating that the color images contained more information that was used by the algorithm to deduce the order of the morphological types.

One difference between the De Vaucouleurs (1959) scheme and the network of morphological similarities produced by the algorithm is the S0 galaxies, for which the computer algorithm did not find the exact order identified by De Vaucouleurs (1959). The ability of the computer to deduce a network of similarities that is largely in agreement with manual analysis demonstrates the discovery power of the method, and its potential ability to analyze larger datasets containing a higher number of galaxy classes and to identify and profile a possible morphological sequence. This allows quantitative morphological analysis of entire galaxies, rather than the quantification of individual identifiable morphological features (e.g., the number of spiral arms).

While the experiments described in this paper are focused on galaxies in the Hubble sequence, with the increasing importance of digital sky surveys imaging billions of galaxies, such as the Large Synoptic Survey Telescope (LSST), automated methods are also important for identifying and analyzing peculiar galaxies that cannot be associated with a defined morphological stage on the Hubble sequence. The scheme of numerical image content descriptors described in Section 3 has demonstrated its efficacy in detecting peculiar galaxy mergers among millions of galaxies in the Sloan Digital Sky Survey, and in performing quantitative assessment of these mergers (Shamir and Wallin, 2014). Sky surveys such as LSST will be able to image a much larger number of galaxies, from which peculiar galaxies can be detected. Automatic detection methods such as those of Shamir (2012a) and Shamir and Wallin (2014) can assist in the detection of peculiar galaxies that are not associated with stages on the Hubble sequence, and analysis methods such as the one described here can be used to identify links between a large number of galaxy types.

The source code of the analysis methods used in the experiment is publicly available, as is the protocol described in Appendix A.
A Protocol


All software tools used to produce the results are open source, making it easier to replicate the results or analyze other datasets (Shamir et al., 2013b). Source code for the computer analysis method (Shamir et al., 2008a) and its dependency libraries is available at the Astrophysics Source Code Library (Shamir et al., 2013b) or at http://vfacstaff.ltu.edu/lshamir/downloads/ImageClassifier. It also requires the installation of the open source PHYLIP package, available at http://evolution.genetics.washington.edu/phylip.html. The experiments also require computational resources that can process the EFIGI catalog. The experiment in this paper was done with a 16-core Intel Core-i7 machine and 32GB of RAM, and took about three days to complete. The experiments were performed in a Linux (Fedora) environment. For further information or assistance, please contact the authors.

To replicate the results, the following steps are required:

1. Download the EFIGI catalog from http://www.astromatic.net/projects/efigi

2. Convert the color FITS images (or PNG images) to TIF format using ImageMagick. A batch conversion can be done with the following command line: find /path/to/efigi -name "*.FITS" -exec convert …

3. Separate the images into folders such that the name of each folder is the T number, and its content is the galaxy images of that T number as annotated by EFIGI.

4. Compute the image features by running the command line: ./wndchrm train -mlc /path/to/efigi_root_folder /path/to/efigi_root_folder/efigi.fit
This step might take several days to complete with a single core, but the response time can be shortened by running several instances of the process. The process should not be stopped, to avoid the creation of empty .sig files. If the process stopped before completion, the following command line should be used before running it again: find /path/to/efigi -name "*.sig" -exec rm {} \;

5. The phylogeny can be created by running the following command line: ./wndchrm test -f0.15 -i140 -j12 -n20 -w -p/path/to/phylip /path/to/efigi_root_folder/efigi.fit /path/to/efigi_root_folder/efigi.html
When done, a .ps file should be created in the folder "/path/to/efigi_root_folder".

6. To process the grayscale images, step 4 should be replaced with the command line: ./wndchrm train -ml -S100:25 /path/to/efigi_root_folder /path/to/efigi_root_folder/efigi.fit
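Step 5 lets Wndchrm call PHYLIP internally. For readers who want to pass their own class similarity matrix to PHYLIP's distance-matrix programs, the sketch below writes the square-matrix input layout those programs read: the number of taxa on the first line, then one row per taxon starting with a name padded to 10 characters. The conversion from similarity to distance (averaging the two directions, then taking 1 - s with a zero diagonal) is an assumption for illustration and is not necessarily what Wndchrm does; the function name is hypothetical.

```python
from typing import List, Sequence

def similarity_to_phylip(sim: Sequence[Sequence[float]], names: Sequence[str]) -> str:
    """Write a square PHYLIP-style distance matrix from a class similarity matrix.

    sim[i][j] is the similarity of test class i to training class j; it need
    not be symmetric, so the two directions are averaged first (assumption).
    """
    n = len(names)
    lines: List[str] = [f"{n:5d}"]
    for i in range(n):
        row = []
        for j in range(n):
            d = 0.0 if i == j else 1.0 - (sim[i][j] + sim[j][i]) / 2.0
            row.append(f"{d:.6f}")
        # PHYLIP reads the first 10 characters of each row as the taxon name
        lines.append(f"{names[i][:10]:<10} " + " ".join(row))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(similarity_to_phylip([[0.7, 0.3], [0.2, 0.8]], ["E", "Sa"]), end="")
```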